Simulated annealing (SA) is a generic probabilistic meta-algorithm for the global optimization problem, namely
locating a good approximation to the global optimum of a given function in a large search space. It was
independently invented by S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi in 1983, and by V. Černý in 1985.

The name and inspiration come from annealing in metallurgy, a technique involving heating and controlled
cooling of a material to increase the size of its crystals and reduce their defects. The heat causes the atoms to
become unstuck from their initial positions (a local minimum of the internal energy) and wander randomly
through states of higher energy; the slow cooling gives them more chances of finding configurations with lower 
internal energy than the initial one. 



Overview 



In the simulated annealing (SA) method, each point s of the search space is compared to a state of some
physical system, and the function E(s) to be minimized is interpreted as the internal energy of the system in that 
state. Therefore the goal is to bring the system, from an arbitrary initial state, to a state with the minimum 
possible energy. 



The basic iteration 



At each step, the SA heuristic considers some neighbour s' of the current state s, and probabilistically decides
between moving the system to state s' or staying put in state s. The probabilities are chosen so that the system
ultimately tends to move to states of lower energy. Typically this step is repeated until the system reaches a
state which is good enough for the application, or until a given computation budget has been exhausted. 

The neighbours of a state 



The neighbours of each state are specified by the user, usually in an application-specific way. For example, in
the traveling salesman problem, each state is typically defined as a particular tour (a permutation of the cities to 
be visited); then one could define two tours to be neighbours if and only if one can be converted to the other by 
interchanging a pair of adjacent cities. 
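
As a rough illustration of this neighbourhood, the following Python sketch produces a random neighbour of a tour
by interchanging one pair of adjacent cities. The function name adjacent_swap_neighbour, the list representation
of a tour, and the decision to treat the tour as cyclic (so that the last and first cities also count as adjacent)
are assumptions made for this example, not part of the description above.

```python
import random

def adjacent_swap_neighbour(tour, rng=random):
    """Return a copy of `tour` with one randomly chosen adjacent pair swapped.

    Assumption: the tour is cyclic, so positions n-1 and 0 are adjacent.
    """
    n = len(tour)
    if n < 2:
        return list(tour)          # nothing to swap
    i = rng.randrange(n)           # first position of the pair
    j = (i + 1) % n                # its cyclic successor
    new_tour = list(tour)          # leave the caller's tour untouched
    new_tour[i], new_tour[j] = new_tour[j], new_tour[i]
    return new_tour

tour = ["A", "B", "C", "D", "E"]
print(adjacent_swap_neighbour(tour))
```

Returning a copy rather than mutating the argument keeps the current state intact in case the move is rejected.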

Transition probabilities

The probability of making the transition to the new state s' is a function P(δE, T) of the energy difference
δE = E(s') - E(s) between the two states, and of a global time-varying parameter T called the temperature.

One essential feature of the SA method is that the transition probability P is defined to be nonzero when δE is
positive, meaning that the system may move to the new state even when it is worse (has a higher energy) than 
the current one. It is this feature that prevents the method from becoming stuck in a local minimum — a state 
whose energy is far from being minimum, but is still less than that of any neighbour. 

Also, when the temperature tends to zero and δE is positive, the probability P(δE, T) tends to zero. Therefore,
for sufficiently small values of T, the system will increasingly favor moves that go "downhill" (to lower energy
values), and avoid those that go "uphill". In particular, when T is 0, the procedure reduces to the greedy
algorithm, which makes the move if and only if it goes downhill.

Another important property of the P function is that the probability of accepting a move decreases as
(positive) δE grows. For any two moves that both have positive δE values, the P function favours the
smaller value (smaller loss).

When δE is negative, P(δE, T) = 1. However, some implementations of the algorithm do not guarantee this
property with the P function, but rather they explicitly check whether δE is negative, in which case the move is
accepted.

The effect of the state energies on the system's evolution depends crucially on the temperature.
Roughly speaking, the evolution is sensitive only to coarser energy variations when T is large, and to finer
variations when T is small.

The annealing schedule 



Another essential feature of the SA method is that the temperature is gradually reduced as the simulation 
proceeds. Initially, T is set to a high value (or infinity), and it is decreased at each step according to some 
annealing schedule, which may be specified by the user but must end with T = 0 towards the end of the
allotted time budget. In this way, the system is expected to wander initially towards a broad region of the search
space containing good solutions, ignoring small features of the energy function; then drift towards low-energy 
regions that become narrower and narrower; and finally move downhill according to the steepest descent 
heuristic. 

Convergence to optimum 



It can be shown that, for any given finite problem, the probability that the simulated annealing algorithm 
terminates with the global optimal solution approaches 1 as the annealing schedule is extended. This 
theoretical result is, however, not particularly helpful, since the annealing time required to ensure a significant 
probability of success will usually exceed the time required for a complete search of the solution space.

Pseudo-code 



The following pseudo-code implements the simulated annealing heuristic, as described above, starting from
state s0 and continuing to a maximum of kmax steps or until a state with energy emax or less is found. The call
neighbour(s) should generate a randomly chosen neighbour of a given state s; the call random() should return
a random value in the range [0, 1). The annealing schedule is defined by the call temp(r), which should yield
the temperature to use, given the fraction r of the time budget that has been expended so far.



    s := s0
    e := E(s)
    k := 0

    while k < kmax and e > emax
        sn := neighbour(s)
        en := E(sn)

        if en < e or random() < P(en - e, temp(k/kmax)) then
            s := sn; e := en
        k := k + 1
    return s
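
The pseudo-code can be transcribed into Python roughly as follows. This is only a sketch under stated assumptions:
the acceptance rule metropolis (the exponential form exp(-δE/T)), the schedule linear_temp, and the toy problem
at the end are illustrative choices that the pseudo-code itself leaves open, and all function and parameter names
are made up for the example.

```python
import math
import random

def metropolis(delta_e, t):
    """Assumed acceptance rule P: 1 for downhill moves, exp(-delta_e/t) for uphill ones."""
    if delta_e <= 0:
        return 1.0
    if t <= 0:
        return 0.0                      # greedy limit: never go uphill at T = 0
    return math.exp(-delta_e / t)

def linear_temp(r, t0=10.0):
    """Assumed schedule temp(r): falls linearly from t0 at r = 0 towards 0 at r = 1."""
    return t0 * (1.0 - r)

def simulated_annealing(s0, energy, neighbour, kmax, emax,
                        temp=linear_temp, p=metropolis, rng=random):
    s = s0
    e = energy(s)
    k = 0
    while k < kmax and e > emax:
        sn = neighbour(s, rng)          # randomly chosen neighbour of s
        en = energy(sn)
        if en < e or rng.random() < p(en - e, temp(k / kmax)):
            s, e = sn, en               # accept the move
        k += 1
    return s

# Hypothetical toy problem: minimise (x - 3)^2 over the integers.
def toy_energy(x):
    return (x - 3) ** 2

def toy_neighbour(x, rng):
    return x + rng.choice((-1, 1))      # step one unit left or right

result = simulated_annealing(50, toy_energy, toy_neighbour,
                             kmax=20000, emax=0, rng=random.Random(0))
print(result, toy_energy(result))
```

As in the pseudo-code, the function returns the state held when the loop stops, which is not necessarily the best
state visited; an implementation that cares about this could additionally remember the lowest-energy state seen.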



Selecting the parameters 



In order to apply the SA method to a specific problem, one must specify the state space, the neighbour
selection method (which enumerates the candidates for the next state s'), the probability transition function
and the annealing schedule. These choices can have a significant impact on the method's effectiveness.
Unfortunately, there are no choices that will be good for all problems, and there is no general way to find the
best choices for a given problem. It has been observed that applying the SA method is more of an art than a
science.



State neighbours 



The neighbour selection method is particularly critical. The method may be modeled as a search graph — 
where the states are vertices, and there is an edge from each state to each of its neighbours. Roughly 
speaking, it must be possible to go from the initial state to a "good enough" state by a relatively short path on 
this graph, and such a path must be as likely as possible to be followed by the SA iteration. 

In practice, one tries to achieve this criterion by using a search graph where the neighbours of s are expected 
to have about the same energy as s. Thus, in the traveling salesman problem above, generating the neighbour 
by swapping two adjacent cities is expected to be more effective than swapping two arbitrary cities. It is true 
that reaching the goal can always be done with only n-1 general swaps, while it may take as many as n(n-1)/2
adjacent swaps. However, if one were to apply a random general swap to a fairly good solution, one would
almost certainly get a large energy increase; whereas swapping two adjacent cities is likely to have a smaller 
effect on the energy. 
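
The n(n-1)/2 figure can be checked on a small case. Sorting by adjacent swaps removes exactly one inversion per
swap, and a fully reversed sequence of n items contains n(n-1)/2 inversions. The sketch below counts the swaps
performed by a plain bubble sort; the helper name adjacent_swaps_to_sort is made up for the example, and the tour
is treated as a linear sequence rather than a cycle, which is a simplifying assumption.

```python
def adjacent_swaps_to_sort(perm):
    """Count the adjacent swaps a bubble sort performs while sorting `perm`."""
    items = list(perm)
    swaps = 0
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            if items[i] > items[i + 1]:
                # Each such swap removes exactly one inversion.
                items[i], items[i + 1] = items[i + 1], items[i]
                swaps += 1
    return swaps

n = 6
worst_case = list(range(n, 0, -1))      # [6, 5, 4, 3, 2, 1]
print(adjacent_swaps_to_sort(worst_case), n * (n - 1) // 2)   # prints: 15 15
```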

Transition probabilities 



The transition probability function P is not as critical as the neighbourhood graph, provided that it follows the
general requirements of the SA method stated before. Since the probabilities depend on the temperature T, in
practice the same probability function is used for all problems, and the annealing schedule is adjusted
accordingly.



The "classical" formula 



In the original formulation of the method by Kirkpatrick et al., the transition probability P(δE, T) was defined as 1
if δE < 0 (i.e., downhill moves were always performed); otherwise, the probability would be exp(-δE/T). This formula
comes from the Metropolis-Hastings algorithm, used here to generate samples from the Maxwell-Boltzmann
distribution governing the distribution of energies of molecules in a gas. Other transition rules can also be
used.
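
To make the behaviour of the classical rule concrete, the sketch below evaluates it for a few energy differences
at a high, a moderate and a low temperature. It assumes the usual Metropolis form exp(-δE/T) for uphill moves;
the function name acceptance_probability and the explicit handling of T = 0 as the greedy limit are choices made
for this example.

```python
import math

def acceptance_probability(delta_e, t):
    """Classical rule: 1 for downhill moves, exp(-delta_e / t) otherwise."""
    if delta_e < 0:
        return 1.0
    if t == 0:
        return 0.0                      # assumed greedy limit at T = 0
    return math.exp(-delta_e / t)

for t in (100.0, 1.0, 0.01):
    row = [acceptance_probability(d, t) for d in (-1.0, 0.1, 1.0, 10.0)]
    print(t, ["%.3g" % p for p in row])
```

At the high temperature nearly every uphill move is accepted, so only very large energy differences matter; at
the low temperature even a small energy increase is almost always rejected.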

Annealing schedule 



The annealing schedule is less critical than the neighbourhood function, but still must be chosen with care. The
initial temperature must be large enough to make the uphill and downhill transition probabilities about the same.
To do that, one must have some estimate of the value of δE for a random state and its neighbours.

The temperature must then decrease so that it is zero, or nearly zero, at the end of the allotted time. A popular
choice is the exponential schedule, where the temperature decreases by a fixed factor α < 1 at each step.
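
A minimal sketch of an exponential schedule follows. The names exponential_schedule and alpha_for_target, and the
idea of deriving the factor from a desired final temperature, are assumptions made for illustration. Note that an
exponential schedule only approaches zero and never reaches it exactly, which matches the "nearly zero" case.

```python
def exponential_schedule(t0, alpha, steps):
    """Yield t0, t0*alpha, t0*alpha**2, ... for the given number of steps."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    t = t0
    for _ in range(steps):
        yield t
        t *= alpha                      # cool by a fixed factor each step

def alpha_for_target(t0, t_end, steps):
    """Factor that takes the temperature from t0 to t_end after `steps` multiplications."""
    return (t_end / t0) ** (1.0 / steps)

print(list(exponential_schedule(10.0, 0.5, 4)))   # prints: [10.0, 5.0, 2.5, 1.25]
print(alpha_for_target(100.0, 1.0, 1000))
```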